Concepts

Latency
  • The number of cycles before the result of one instruction is available to the next instruction that depends on it.

Throughput
  • How often an instruction of a given kind can issue, for example once per cycle.

Cold data
  • Data that is not in any cache, so the access has to go all the way to RAM.
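The difference between latency and throughput can be made concrete with a deliberately simplified cycle model. The sketch below is an illustration, not a description of any particular CPU: the function names and the `reciprocal_throughput` parameter (cycles between two issues of the same instruction) are assumptions introduced here, and real cores have more limits than this model captures.

```cpp
#include <cstdint>

// Dependent chain: each instruction waits for the previous result,
// so n instructions cost roughly n * latency cycles.
constexpr std::uint64_t dependent_chain_cycles(std::uint64_t n,
                                               std::uint64_t latency) {
    return n * latency;
}

// Independent instructions overlap in the pipeline: a new one issues every
// `reciprocal_throughput` cycles, and the last one still needs `latency`
// cycles to produce its result.
constexpr std::uint64_t independent_cycles(std::uint64_t n,
                                           std::uint64_t latency,
                                           std::uint64_t reciprocal_throughput) {
    return n == 0 ? 0 : (n - 1) * reciprocal_throughput + latency;
}
```

With a hypothetical latency of 4 cycles and one issue per cycle, 100 dependent instructions take about 400 cycles in this model, while 100 independent ones take about 103.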

Tips

  • Always count cache lines, not bytes, in bandwidth computations: memory moves in whole cache lines, so touching a single byte costs the full line.

  • Out-of-order execution hides latency and pipelines everything, but we can help it by splitting long dependency chains into independent ones.

  • RAM is fast, but far from the CPU. We can help by laying out data so the hardware prefetcher can follow the access pattern.
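The tips above can be sketched with one small scenario: summing a float field over many elements. This is a minimal illustration under stated assumptions. The 64-byte line size is typical but not universal, the elements are assumed to start at a line-aligned address with a stride no smaller than the element size, and all names (`kCacheLineBytes`, `cache_lines_touched`, `sum_single_chain`, `sum_four_chains`) are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLineBytes = 64;

// Counts distinct cache lines touched when reading `count` elements of
// `elem_bytes` bytes (at least 1) placed `stride_bytes` apart.
constexpr std::size_t cache_lines_touched(std::size_t count,
                                          std::size_t elem_bytes,
                                          std::size_t stride_bytes) {
    std::size_t lines = 0;
    std::size_t next_uncounted = 0;  // first line index not yet counted
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t first = (i * stride_bytes) / kCacheLineBytes;
        const std::size_t last =
            (i * stride_bytes + elem_bytes - 1) / kCacheLineBytes;
        const std::size_t start = first > next_uncounted ? first : next_uncounted;
        if (last >= start) {
            lines += last - start + 1;
            next_uncounted = last + 1;
        }
    }
    return lines;
}

// One accumulator: every add depends on the previous one, so the loop is
// bound by the latency of a floating-point add.
float sum_single_chain(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Four accumulators: four independent chains the out-of-order core can
// overlap. Note that reordering float adds can change rounding.
float sum_four_chains(const std::vector<float>& v) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}
```

Under these assumptions, 1024 tightly packed 4-byte floats touch 64 cache lines, while the same 1024 floats spaced 64 bytes apart (one field inside a larger struct) touch 1024 lines. The useful bytes are identical, but the memory traffic is 16 times higher. The packed layout is also a simple sequential stream, which is the kind of pattern a hardware prefetcher handles well.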